Pose context length ext #1567

Merged
winglian merged 8 commits into main from pose-context-length-ext
Apr 27, 2024

Conversation

winglian
Collaborator

PoSE paper: https://huggingface.co/papers/2309.10400
Model: https://huggingface.co/winglian/Llama-3-8b-64k-PoSE
YAML: https://huggingface.co/winglian/Llama-3-8b-64k-PoSE/blob/main/axolotl/pose.yaml

Add the PoSE technique for extending context length without needing long context data.
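
For readers unfamiliar with the paper, here is a minimal sketch of the positional skip-wise idea, not the code in this PR. It assumes an even chunk split and a hypothetical helper name, `pose_position_ids`; the function and its parameters are illustrative only. A short sample of `sample_len` tokens is split into chunks, and each chunk after the first gets a random, non-decreasing offset so that the position ids seen in training spread over the longer `target_len` window.

```python
import random


def pose_position_ids(sample_len, target_len, num_chunks=2, rng=None):
    """Hypothetical helper: position ids for one sample, PoSE-style.

    Assumption: chunks are split evenly; this is a simplification made
    for the sketch and may differ from the paper and from this PR.
    """
    if not 1 <= num_chunks <= sample_len:
        raise ValueError("num_chunks must be between 1 and sample_len")
    if target_len < sample_len:
        raise ValueError("target_len must be at least sample_len")
    rng = rng or random.Random()

    # Chunk boundaries inside the short training sample.
    bounds = [sample_len * k // num_chunks for k in range(num_chunks + 1)]
    # Largest offset that keeps every position id below target_len.
    max_skip = target_len - sample_len

    position_ids = []
    bias = 0  # the first chunk keeps its original positions
    for k in range(num_chunks):
        if k > 0:
            # Non-decreasing bias, so chunks never overlap or reorder.
            bias = rng.randint(bias, max_skip)
        position_ids.extend(range(bounds[k] + bias, bounds[k + 1] + bias))
    return position_ids


ids = pose_position_ids(sample_len=8, target_len=64, num_chunks=4)
print(ids)  # 8 strictly increasing ids, all below 64
```

The token count stays at `sample_len`, so memory use matches short-context training; only the position ids change.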

@arkapal3 arkapal3 left a comment

I'm pretty sure your PR doesn't quite work for chunks > 2.

            i for i, token_id in enumerate(input_ids) if token_id in split_on_token_ids
        ]
    else:
        split_indices = [sample_len // chunks]

I don't think this is going to work for any n_chunks > 2, right?

Collaborator Author

ah, you're right
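
To illustrate the point raised in this thread: `[sample_len // chunks]` yields a single split index, so the sample is only ever cut into two pieces regardless of `chunks`. One possible way to get `chunks - 1` evenly spaced split points is sketched below. This is an assumption for illustration, with a hypothetical function name, and is not necessarily the fix that was merged.

```python
from typing import List


def even_split_indices(sample_len: int, chunks: int) -> List[int]:
    """Hypothetical helper: chunks - 1 evenly spaced split points."""
    if chunks < 1:
        raise ValueError("chunks must be at least 1")
    # k runs over 1 .. chunks - 1, so one chunk gives no split points.
    return [sample_len * k // chunks for k in range(1, chunks)]


print(even_split_indices(12, 2))  # [6], same as sample_len // chunks
print(even_split_indices(12, 3))  # [4, 8]
```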

@winglian winglian merged commit 5294653 into main Apr 27, 2024
7 checks passed
@winglian winglian deleted the pose-context-length-ext branch April 27, 2024 16:28
djsaunde pushed a commit that referenced this pull request Dec 17, 2024
* PoSE wip

* fixes for pose splitting

* set pose context len so we can pick that up separately from the usable training context len

* support min sample len and define num chunks

* fix chunk splitting

* support for curriculum/ordered learning with pose

* fix sequence len sort

* add curriculum_sampling to pydantic